IN THE CLAIMS:

Listing of the Claims:

1. (Canceled) A method for identifying a novel nucleic acid molecule encoding a protein 
of interest comprising: 

(i) selecting a specific protein from a first species involved in a regulatory 
network of interest; 

(ii) identifying known proteins that act upstream and downstream in the 
regulatory network of interest with respect to the specific protein selected; 

(iii) constructing the regulatory network of interest from the proteins identified 
in step (ii); 

(iv) for each identified protein, selecting a domain or motif and searching by
homology for related proteins in a second species, wherein a related
protein is defined as a protein having a homologous domain or motif;

(v) producing a regulatory network for the second species, wherein said 
regulatory network incorporates the identified related proteins; 

(vi) comparing the regulatory network from the first species to the regulatory 
network of said second species; 

(vii) identifying a protein present in a regulatory network for one species but 
absent in the regulatory network of the other species; and 

(viii) isolating a nucleic acid molecule encoding the protein identified in step (vii)
in the species in which it is missing.
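Steps (vi) and (vii) amount to comparing two graphs and reporting nodes found in only one of them. The following is a minimal illustrative sketch, not part of the claimed method. It assumes that each regulatory network is stored as a set of directed (upstream, downstream) edges and that proteins in both species are labelled by a shared ortholog-group identifier, so the two networks are directly comparable. The function names `nodes` and `missing_proteins` and the `OG…` labels are hypothetical.

```python
from typing import Dict, Set, Tuple

# An edge (upstream, downstream). Assumption: nodes carry a shared
# ortholog-group label so that the two species' networks are comparable.
Edge = Tuple[str, str]


def nodes(network: Set[Edge]) -> Set[str]:
    """Return every protein (node) that appears in a network's edges."""
    return {protein for edge in network for protein in edge}


def missing_proteins(first: Set[Edge], second: Set[Edge]) -> Dict[str, Set[str]]:
    """Report proteins present in one network but absent from the other."""
    a, b = nodes(first), nodes(second)
    return {"missing_in_second": a - b, "missing_in_first": b - a}


# Hypothetical networks built from made-up ortholog-group labels.
species_1 = {("OG1", "OG2"), ("OG2", "OG3"), ("OG3", "OG4")}
species_2 = {("OG1", "OG2"), ("OG2", "OG4")}
result = missing_proteins(species_1, species_2)
print(result)
```

Here OG3 is the protein that is present in the first network but missing from the second. Under step (viii), it would be the candidate whose encoding nucleic acid molecule is sought in the second species.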
2. (Canceled) The method of claim 1 wherein the nucleic acid molecule encodes a human
protein.

3. (Canceled) The method of claim 1 wherein the related proteins are orthologs. 

4. (Canceled) The method of claim 1 wherein the regulatory network of interest is involved in
apoptosis.

5. (Canceled) The method of claim 1 wherein the specific protein from the first species 
is involved in tumor suppression. 

6. (Canceled) A method for identifying the effect of a gene knockout on a regulatory
pathway comprising the following steps:

(i) identifying the shortest non-oriented pathway connecting two gene
products;

(ii) assigning an initial sign value of "-" to the knockout since the knockout 
gene product is inactive; 

(iii) moving along the shortest pathway between the two gene products 
multiplying the sign with the sign of the next gene product in the pathway, 
wherein "-" stands for inhibition, "+" stands for induction or activation, 
and "0" stands for the lack of interaction between two proteins in the 
specified direction; and 

(iv) determining the final sign at the end of the pathway, wherein "-" indicates 
inhibition and "+" indicates induction or activation of the pathway. 
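The sign-propagation rule of claim 6 can be sketched in a few lines. This is an illustration only, and it rests on several assumptions. Interactions are stored as a dictionary from (source, target) pairs to +1 (induction or activation) or -1 (inhibition). The shortest non-oriented path is found by breadth-first search that ignores edge direction. Traversing an edge against its direction counts as "0", meaning no interaction in the specified direction. The names `shortest_undirected_path` and `knockout_effect` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Signs = Dict[Tuple[str, str], int]  # (source, target) -> +1 activation, -1 inhibition


def shortest_undirected_path(signs: Signs, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search that ignores edge direction ("non-oriented")."""
    neighbours: Dict[str, Set[str]] = {}
    for u, v in signs:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def knockout_effect(signs: Signs, knockout: str, target: str) -> int:
    """Return -1 (inhibition), +1 (activation) or 0 (no effect) on target."""
    path = shortest_undirected_path(signs, knockout, target)
    if path is None:
        return 0
    sign = -1  # initial "-": the knocked-out gene product is inactive
    for u, v in zip(path, path[1:]):
        # An edge that exists only in the opposite direction contributes 0.
        sign *= signs.get((u, v), 0)
    return sign


# Hypothetical pathway: A activates B, B inhibits C.
pathway = {("A", "B"): +1, ("B", "C"): -1}
print(knockout_effect(pathway, "A", "C"))  # prints 1
```

Knocking out A removes the activation of B. That in turn relieves B's inhibition of C, so the final sign is "+": (-1) x (+1) x (-1) = +1.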
7. (Canceled) A method for identifying a novel nucleic acid molecule encoding a protein
of interest comprising: 

(i) selecting a gene of interest and searching a database for homologous 
sequences; 

(ii) aligning the homologous sequences identified in step (i); 

(iii) constructing a gene tree using the sequence alignment; 

(iv) constructing a species tree;

(v) inputting the species tree and the gene tree into an algorithm which integrates
the species tree and the gene tree into a reconciled tree; and
(vi) identifying orthologous genes present in one species but missing in 
another. 

8. (Canceled) The method of claim 7 wherein the following algorithm is used to 
integrate the species tree and the gene tree into a reconciled tree: 

(i) computing the similarity for each pair of interior nodes, one from tree Tg
and one from tree Ts;

(ii) finding the pair of nodes with the maximum similarity;

(iii) saving Sgi as a new cluster of orthologs and saving {Sgi} - {Ssj} as a set of
species that are likely to have a gene of this kind (or to have lost it in evolution);

(iv) eliminating Sgi from Tg, i.e., Tg := Tg \ Sgi; and

(v) repeating steps (ii)-(iv) until Tg is empty.
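The greedy loop of claim 8 is sketched below under stated assumptions. This is a toy illustration, not the claimed algorithm itself. Each interior node is represented by the set of leaves below it: genes for Tg and species for Ts. The claim does not name a similarity measure, so this sketch substitutes Jaccard similarity between species sets. For the saved species set, it reports the species of the best-matching species-tree clade that have no gene in the cluster, which is only one possible reading of step (iii). The names `jaccard` and `reconcile` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Cluster = FrozenSet[str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity measure; the claim does not specify one."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reconcile(gene_nodes: List[Cluster], species_of: Dict[str, str],
              species_nodes: List[Cluster]) -> List[Tuple[Cluster, Set[str]]]:
    """Greedily extract ortholog clusters from gene tree Tg against species tree Ts.

    gene_nodes:    interior nodes of Tg, each given as the set of genes below it.
    species_of:    maps each gene to its species.
    species_nodes: interior nodes of Ts, each given as the set of species below it.
    """
    remaining = [set(node) for node in gene_nodes]
    results = []
    while any(remaining):  # step (v): repeat while Tg still has genes
        # steps (i)-(ii): score every pair of interior nodes and take the best
        _, i, j = max(
            (jaccard({species_of[g] for g in node}, set(clade)), i, j)
            for i, node in enumerate(remaining) if node
            for j, clade in enumerate(species_nodes)
        )
        cluster = frozenset(remaining[i])
        present = {species_of[g] for g in cluster}
        # step (iii): save the cluster and the matched clade's species that lack it
        results.append((cluster, set(species_nodes[j]) - present))
        # step (iv): Tg := Tg \ Sgi
        remaining = [node - cluster for node in remaining]
    return results


# Hypothetical example: species tree ((human, mouse), (fly, yeast)).
species_tree = [frozenset({"human", "mouse"}), frozenset({"fly", "yeast"}),
                frozenset({"human", "mouse", "fly", "yeast"})]
gene_tree = [frozenset({"h1", "m1"}), frozenset({"h1", "m1", "f1"})]
genes = {"h1": "human", "m1": "mouse", "f1": "fly"}
for cluster, missing in reconcile(gene_tree, genes, species_tree):
    print(sorted(cluster), "missing in:", sorted(missing))
```

The clusters this produces depend heavily on the similarity measure. With Jaccard, the fly gene matches the (fly, yeast) clade better than the whole tree, so it ends up in its own cluster, and yeast is flagged as the species likely lacking that gene.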

9. (Canceled) A method for identifying a novel gene comprising the following steps: 
(i) defining a motif or domain composition of a gene of interest;

(ii) searching an expressed sequence tag database or other cDNA databases for
corresponding nucleotide sequences using a program such as BLAST, and
retrieving the identified sequences;

(iii) searching additional databases for expressed sequence tags containing
the domains and motifs characteristic of the gene of interest, using Hidden
Markov Models of the domains and motifs identified in step (i); and

(iv) identifying nucleotide sequences comprising the gene of interest.

10. (Canceled) The method of claim 9 further comprising using each identified
expressed sequence tag to search sequence databases for overlapping sequences for the purpose
of assembling longer overlapping stretches of DNA.
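Assembling longer stretches from overlapping ESTs can be illustrated with a toy greedy suffix-prefix merger. This sketch is not taken from the claims, which do not specify an assembly procedure. It assumes error-free reads on a single strand, with no reverse complements and no contained reads, and a hypothetical minimum overlap of 3 bases. The names `overlap` and `greedy_assemble` are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:  # no overlaps left: return the assembled stretches
            return contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCC", "GCCTTA", "TTAGG"]))
```

Real EST clustering tools also have to handle sequencing errors, both strands, and chimeric reads. The greedy strategy is chosen here only because it is the simplest way to show overlapping reads being joined.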

11. (Currently Amended) A computerized method for extracting information on
interactions between biological entities from natural-language genomics text data, comprising: 

(i) parsing the genomics text data to determine the grammatical structure of
the text data;

(ii) regularizing the parsed text data to form structured word terms T; and 

(iii) extracting interactions between biological entities from natural-language
genomics text data. 

12. (Original) The method according to claim 11, further comprising preprocessing the 
data prior to parsing, with preprocessing comprising the step of identifying biological entities. 
13. (Original) The method according to claim 11, further comprising referring to an
additional parameter which is indicative of the degree to which subphrase parsing is to be carried 
out. 

14. (Original) The method according to claim 11, wherein said parsing step further 
comprises segmenting the text data by sentences. 

15. (Original) The method according to claim 11, wherein said parsing step further 
comprises: 

segmenting the text data by sentences; and 

segmenting each of the sentences at identified words or phrases. 
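The two-level segmentation of claim 15 (first by sentence, then at identified words or phrases) can be sketched with regular expressions. This is a simplification for illustration only. The sentence splitter is naive and would mis-split abbreviations such as "e.g.". The cue phrases in `CUE_PHRASES` are hypothetical examples, not a list taken from the application.

```python
import re
from typing import List, Sequence

# Hypothetical cue words/phrases at which a sentence is segmented.
CUE_PHRASES = ("which", "whereas", "and thereby")


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_at_phrases(sentence: str, phrases: Sequence[str] = CUE_PHRASES) -> List[str]:
    """Segment one sentence at each occurrence of an identified word or phrase."""
    pattern = r"\s*\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b\s*"
    parts = (part.strip(" ,;") for part in re.split(pattern, sentence))
    return [part for part in parts if part]


text = "Raf activates MEK, which phosphorylates ERK. p53 induces p21."
segments = [split_at_phrases(s) for s in split_sentences(text)]
print(segments)
```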

16. (Original) The method according to claim 11, wherein said parsing step further 
comprises: 

segmenting the text data by sentences; and 
segmenting each of the sentences at a prefix.

17. (Original) The method according to claim 11, wherein said parsing step further 
comprises skipping undefined words. 

18. (Original) The method according to claim 11, wherein said parsing step further 
comprises: 

identifying one or more binary actions and their relationships; and 
identifying one or more arguments associated with the actions. 
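Identifying binary actions and their arguments (claim 18), together with regularizing verb forms into structured terms (claim 11), can be illustrated with a small pattern-based extractor. This is a sketch only, not the application's parser. It assumes single-token entity names and a hypothetical verb lexicon `ACTIONS` that maps surface forms to regularized action terms, and it handles only the simple active pattern "agent action target" and the passive pattern "target is action by agent".

```python
import re
from typing import NamedTuple, Optional

# Hypothetical lexicon: surface verb form -> regularized binary action.
ACTIONS = {
    "activates": "activate", "activated": "activate",
    "inhibits": "inhibit", "inhibited": "inhibit",
    "phosphorylates": "phosphorylate", "phosphorylated": "phosphorylate",
    "binds to": "bind", "binds": "bind",
}
ENTITY = r"([A-Za-z0-9-]+)"  # assumption: entity names are single tokens


class Interaction(NamedTuple):
    agent: str
    action: str
    target: str


def extract(sentence: str) -> Optional[Interaction]:
    """Return the binary action and its two arguments, or None if no pattern fits."""
    text = sentence.strip().rstrip(".")
    for form in sorted(ACTIONS, key=len, reverse=True):  # longest forms first
        verb = re.escape(form)
        passive = re.fullmatch(ENTITY + r" (?:is|was|are|were) " + verb + r" by " + ENTITY, text)
        if passive:
            return Interaction(passive.group(2), ACTIONS[form], passive.group(1))
        active = re.fullmatch(ENTITY + " " + verb + " " + ENTITY, text)
        if active:
            return Interaction(active.group(1), ACTIONS[form], active.group(2))
    return None


print(extract("ERK is phosphorylated by MEK."))
```

Both the active and the passive form map to the same (agent, action, target) record. That normalization is the point of regularizing parsed text into structured terms before extraction.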
19. (Currently Amended) A computerized method for extracting information on
interactions between biological entities from natural-language genomics text data, comprising: 

(i) parsing the genomics text data to determine the grammatical structure of
the text data, wherein if said parsing of the text data is unsuccessful, error
recovery is performed;

(ii) regularizing the parsed text data to form structured word terms T; and 
(iii) extracting interactions between biological entities from natural-language
genomics text data.

20. (Original) The method according to claim 19, wherein said error recovery step 
comprises: 

segmenting the text data; and 

analyzing the segmented text data to achieve at least a partial parsing of the 
unsuccessfully parsed text data. 
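The error-recovery fallback of claim 20 can be illustrated as follows: attempt a full parse, and on failure segment the text and keep whatever segments parse. In this sketch, a deliberately strict toy grammar stands in for a real parser. The grammar `CLAUSE`, the segment delimiters (commas, semicolons, and the conjunctions "and", "but", "while"), and the function names are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical toy grammar: a clause is exactly "<entity> <verb> <entity>".
CLAUSE = re.compile(r"([A-Za-z0-9-]+) (activates|inhibits|binds) ([A-Za-z0-9-]+)")


def parse(text: str) -> Optional[Tuple[str, ...]]:
    """Strict parse: succeeds only if the whole text is one clause."""
    match = CLAUSE.fullmatch(text.strip().rstrip("."))
    return match.groups() if match else None


def parse_with_recovery(text: str) -> List[Tuple[str, ...]]:
    """Fall back to segment-by-segment parsing when the full parse fails."""
    full = parse(text)
    if full is not None:
        return [full]
    # Error recovery: segment the text, then keep whatever segments parse.
    segments = re.split(r"[,;]\s*|\s+(?:and|but|while)\s+", text)
    return [result for result in map(parse, segments) if result is not None]


sentence = "Although poorly understood in vivo, Raf activates MEK and MEK activates ERK."
print(parse_with_recovery(sentence))
```

The full sentence fails the strict grammar, yet two clauses are still recovered. This corresponds to the "at least a partial parsing" recited in the claim.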

21. (Currently Amended) A computerized method for extracting information on
interactions between biological entities from natural-language genomics text data, comprising: 

(i) parsing the genomics text data to determine the grammatical structure of 
the text data; 

(ii) regularizing the parsed text data to form structured word terms; and 

(iii) tagging the text data with a structured data component derived from the
structured word terms, wherein said tagging step comprises
providing the structured data component in a Standard Generalized
Markup Language (SGML) compatible format; and
(iv) extracting interactions between biological entities from natural-language
genomics text data.
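Tagging text with a structured data component in an SGML-compatible format (claim 21) can be illustrated by emitting markup that is also well-formed XML, since XML is a subset of SGML. This is a sketch only. The element and attribute names (`SENT`, `TEXT`, `INTERACTION`, `ACTION`, `AGENT`, `TARGET`) and the function name `tag_sentence` are hypothetical, and any DTD-defined vocabulary could be used instead. Escaping uses the standard-library helpers `xml.sax.saxutils.escape` and `quoteattr`.

```python
from xml.sax.saxutils import escape, quoteattr


def tag_sentence(sentence: str, agent: str, action: str, target: str) -> str:
    """Attach a structured interaction record to a sentence as SGML-style markup.

    Element and attribute names are hypothetical placeholders.
    """
    return (
        "<SENT>"
        f"<TEXT>{escape(sentence)}</TEXT>"
        f"<INTERACTION ACTION={quoteattr(action)}>"
        f"<AGENT>{escape(agent)}</AGENT><TARGET>{escape(target)}</TARGET>"
        "</INTERACTION>"
        "</SENT>"
    )


print(tag_sentence("Raf activates MEK.", "Raf", "activate", "MEK"))
```

Escaping the original sentence keeps characters such as "<" and "&" in biomedical text from breaking the markup.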

22. (Canceled) A computer system for extracting information on biological entities from 
natural-language text data, comprising: 

(i) means for parsing the natural-language text data; and 

(ii) means for regularizing the parsed text data to form structured word terms. 

23. (Canceled) The system according to claim 22, further comprising means for 
preprocessing the data prior to parsing, with the preprocessing means comprising identifying 
biological entities. 

24. (Canceled) The system according to claim 22, further comprising means for 
referring to an additional parameter which is indicative of the degree to which subphrase parsing 
is to be carried out. 

25. (Canceled) The system according to claim 22, wherein said parsing means further 
comprises means for segmenting the text data by sentences. 

26. (Canceled) The system according to claim 22, wherein said parsing means further 
comprises: 

means for segmenting the text data by sentences; and 

means for segmenting each of the sentences at identified words or phrases. 
27. (Canceled) The system according to claim 22, wherein said parsing means further
comprises: 

means for segmenting the text data by sentences; and 
means for segmenting each of the sentences at a prefix. 

28. (Canceled) The system according to claim 22, wherein said parsing means further 
comprises means for skipping undefined words. 

29. (Canceled) The system according to claim 22, wherein said parsing means further 
comprises: 

means for identifying one or more binary actions and their relationships; and 
means for identifying one or more arguments associated with the actions. 

30. (Canceled) The system according to claim 22, further comprising means for 
performing error recovery when parsing of the text data is unsuccessful. 

31. (Canceled) The system according to claim 30, wherein said error recovery means
comprises: 

means for segmenting the text data; and 

means for analyzing the segmented text data to achieve at least a partial parsing of 
the unsuccessfully parsed text data. 

32. (Canceled) The system according to claim 22, wherein said tagging means 
comprises means for providing the structured data component in a Standard Generalized Markup 
Language (SGML) compatible format. 